OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)

Internal identifier: 000581 (Main/Exploration); previous: 000580; next: 000582

Authors: Günter Mühlberger [Austria]

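Source: Zeitschrift für Bibliothekswesen und Bibliographie (Z. Bibliothekswes. Bibliogr.), ISSN 0044-2380, 2011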

RBID : Pascal:11-0198412

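French descriptors: Technologie information communication; Reconnaissance optique caractère; Numérisation

English descriptors: Digitizing; Information communication technology; Optical character recognition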

Abstract

OCR recognition is a key technology which cannot be circumvented when systematically digitizing historical newspapers. Although often achieving a word accuracy of only 80% or less for newspapers of the 19th and early 20th century, these imperfect files nevertheless provide a basis for a number of interesting applications - from full-text searching to indexing by search engines and online correction by users. However, in comparison to traditional digitization projects, the use of OCR requires a fundamental change of thinking during the project planning, the design of the workflow, the implementation of quality control, and in the designing of long-term preservation and presentation of digitized material on the Internet.


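Affiliations: Universitäts- und Landesbibliothek Tirol, Abteilung für Digitalisierung und elektronische Archivierung, Innrain 52, 6020 Innsbruck, Austria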



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="GER" level="a">Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)</title>
<author>
<name sortKey="Muhlberger, Gunter" sort="Muhlberger, Gunter" uniqKey="Muhlberger G" first="Günter" last="Mühlberger">Günter Mühlberger</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Universitäts- und Landesbibliothek Tirol, Abteilung für Digitalisierung und elektronische Archivierung, Innrain 52,</s1>
<s2>6020 Innsbruck</s2>
<s3>AUT</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<wicri:noRegion>6020 Innsbruck</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">11-0198412</idno>
<date when="2011">2011</date>
<idno type="stanalyst">PASCAL 11-0198412 INIST</idno>
<idno type="RBID">Pascal:11-0198412</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000146</idno>
<idno type="stanalyst">FRANCIS 11-0198412 INIST</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000157</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000627</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000135</idno>
<idno type="wicri:doubleKey">0044-2380:2011:Muhlberger G:digitalisierung:historischer:zeitungen</idno>
<idno type="wicri:Area/Main/Merge">000587</idno>
<idno type="wicri:Area/Main/Curation">000581</idno>
<idno type="wicri:Area/Main/Exploration">000581</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="GER" level="a">Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)</title>
<author>
<name sortKey="Muhlberger, Gunter" sort="Muhlberger, Gunter" uniqKey="Muhlberger G" first="Günter" last="Mühlberger">Günter Mühlberger</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Universitäts- und Landesbibliothek Tirol, Abteilung für Digitalisierung und elektronische Archivierung, Innrain 52,</s1>
<s2>6020 Innsbruck</s2>
<s3>AUT</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<wicri:noRegion>6020 Innsbruck</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Zeitschrift für Bibliothekswesen und Bibliographie</title>
<title level="j" type="abbreviated">Z. Bibliothekswes. Bibliogr.</title>
<idno type="ISSN">0044-2380</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Zeitschrift für Bibliothekswesen und Bibliographie</title>
<title level="j" type="abbreviated">Z. Bibliothekswes. Bibliogr.</title>
<idno type="ISSN">0044-2380</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Digitizing</term>
<term>Information communication technology</term>
<term>Optical character recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Technologie information communication</term>
<term>Reconnaissance optique caractère</term>
<term>Numérisation</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">OCR recognition is a key technology which cannot be circumvented when systematically digitizing historical newspapers. Although often achieving a word accuracy of only 80% or less for newspapers of the 19th and early 20th century, these imperfect files nevertheless provide a basis for a number of interesting applications - from full-text searching to indexing by search engines and online correction by users. However, in comparison to traditional digitization projects, the use of OCR requires a fundamental change of thinking during the project planning, the design of the workflow, the implementation of quality control, and in the designing of long-term preservation and presentation of digitized material on the Internet.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Autriche</li>
</country>
</list>
<tree>
<country name="Autriche">
<noRegion>
<name sortKey="Muhlberger, Gunter" sort="Muhlberger, Gunter" uniqKey="Muhlberger G" first="Günter" last="Mühlberger">Günter Mühlberger</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000581 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000581 | SxmlIndent | more
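
Where Dilib is not installed, a record like the one shown above can also be read with a general-purpose XML parser. The following is a minimal sketch in Python, not part of Dilib or Wicri. The record uses the `wicri:` and `inist:` prefixes without declaring them, so the sketch wraps the fragment in a hypothetical `wrapper` element with placeholder namespace URIs; those URIs, the helper names `parse_record` and `summarize`, and the abridged `SAMPLE` fragment are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the record does not declare the
# wicri: and inist: prefixes, so these values exist only to satisfy the parser.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}

# Abridged copy of the record, kept short for illustration.
SAMPLE = """<record><TEI><teiHeader><fileDesc>
<titleStmt><title xml:lang="GER" level="a">Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)</title>
<author><affiliation wicri:level="1"><country>Autriche</country></affiliation></author></titleStmt>
<publicationStmt><date when="2011">2011</date></publicationStmt>
<seriesStmt><idno type="ISSN">0044-2380</idno></seriesStmt>
</fileDesc>
<profileDesc><textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Digitizing</term><term>Information communication technology</term><term>Optical character recognition</term></keywords>
</textClass></profileDesc></teiHeader></TEI></record>"""


def parse_record(fragment: str) -> ET.Element:
    """Parse a <record> fragment whose namespace prefixes are undeclared."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    wrapped = f"<wrapper {decls}>{fragment}</wrapper>"
    return ET.fromstring(wrapped).find("record")


def summarize(record: ET.Element) -> dict:
    """Collect the article title, year, ISSN, and keyword lists by scheme."""
    keywords = {}
    for group in record.iter("keywords"):
        keywords[group.get("scheme")] = [t.text for t in group.findall("term")]
    return {
        "title": record.findtext(".//titleStmt/title"),
        "year": record.find(".//publicationStmt/date").get("when"),
        "issn": record.findtext(".//seriesStmt/idno[@type='ISSN']"),
        "keywords": keywords,
    }


if __name__ == "__main__":
    info = summarize(parse_record(SAMPLE))
    print(info["title"])
    print(info["year"], info["issn"])
    print(info["keywords"])
```

The wrapper approach leaves the record text itself unchanged. It assumes the fragment carries no XML declaration of its own, which holds for the record as displayed on this page.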

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:11-0198412
   |texte=   Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024